Abstract

We show that the cost of market orders and the profit of infinitesimal 
market-making or -taking strategies can be expressed in terms of directly 
observable quantities, namely the spread and the lag-dependent impact 
function. Imposing that any market taking or liquidity providing strategy
is at best marginally profitable, we obtain a linear relation between
the bid-ask spread and the instantaneous impact of market orders, in good 
agreement with our empirical observations on electronic markets. We then 
use this relation to justify a strong, and hitherto unnoticed, empirical correlation
between the spread and the volatility per trade, with $R^2$ exceeding
0.9. This correlation suggests both that the main determinant of the bid- 
ask spread is adverse selection, and that most of the volatility comes from 
trade impact. We argue that the role of the time-horizon appearing in the 
definition of costs is crucial and that long-range correlations in the order 
flow, overlooked in previous studies, must be carefully factored in. We find 
that the spread is significantly larger on the NYSE, a liquid market with 
specialists, where monopoly rents appear to be present. 

1 Introduction and review of the literature 

One of the most important attributes of financial markets is to provide immediate
liquidity to investors [1], who are able to convert cash into stocks and vice versa
nearly instantaneously whenever they choose to do so. Of course, some markets
are more liquid than others and the liquidity of a given market varies in time 
and can in fact dramatically dry up in crisis situations. How should markets 
be organized, at the micro-structural level, to optimize liquidity, to favor steady 
and orderly trading and avoid these liquidity crises? In the past, the burden 
of providing liquidity was given to "market makers" (or specialists). In order 
to ensure steady trading, the specialists alternately sell to buyers and buy from
sellers, and get compensated by the so-called bid-ask spread - i.e. the price at
which they sell to the crowd is always slightly larger than the price at which they 
buy. The determinants of the value of the spread in specialists markets have been 
the subject of many studies in the economics literature [2][3][4][5][6][7][8][9][10],
and [11] for a recent review.

However, most financial markets have nowadays become fully electronic (with 
the notable exception of the New York Stock Exchange, NYSE - although this will
soon change). In these markets, liquidity is self-organized, in the sense that any 
agent can choose, at any instant of time, to either provide liquidity or consume 
liquidity. More precisely, any agent can provide liquidity by posting limit orders: 
these are propositions to sell (or buy) a certain volume of shares or lots at a 
fixed minimum (maximum) price. Limit orders are stored in the order-book. At 
a given instant in time, the best offer on the sell side (the 'ask') is higher than 
the best price on the buy side (the 'bid') so no transaction takes place. For a 
transaction to occur, an agent must consume liquidity by issuing a market order 
to buy (or to sell) a certain number of shares; the transaction occurs at the best 
available price, provided the volume in the order book at that price is enough to 
absorb the incoming market order. Otherwise, the price 'walks up' (or down) the 
ladder of offers in the order book, until the order is fully satisfied. The liquidity of 
the market is partially characterized by the bid-ask spread S, which sets the cost 
of an instantaneous round-trip of one share (a buy instantaneously followed by a 
sell, or vice versa).^1 A liquid market is such that this cost is small. A question
of crucial importance, both theoretical and practical, is to know what fixes the
magnitude of the spread in the self-organized set-up of electronic markets, and 
the relative merit of limit vs. market orders. In the present work, we argue 
that on electronic markets, profitable high frequency strategies using either limit 
or market orders should not exist, imposing a linear relation between the bid- 
ask spread S and the average impact of market orders. This, in turn, justifies a
simple, but hitherto unnoticed, proportionality relation between the spread and
the volatility per trade. 
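
The execution mechanics described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `execute_market_buy` and the toy order book are hypothetical, and queue priority, cancellations and hidden liquidity are ignored.

```python
from typing import List, Tuple


def execute_market_buy(asks: List[Tuple[float, int]], volume: int) -> Tuple[float, int]:
    """Fill a market buy against ask levels sorted by increasing price.

    Returns (average execution price, volume left unfilled).
    """
    remaining = volume
    paid = 0.0
    for price, available in asks:
        if remaining == 0:
            break
        traded = min(remaining, available)  # take what this level can absorb
        paid += traded * price
        remaining -= traded
    filled = volume - remaining
    average_price = paid / filled if filled else float("nan")
    return average_price, remaining


# Hypothetical book: best bid at 99.9, asks at 100.1 (50 shares) and 100.2 (100 shares).
best_bid = 99.9
asks = [(100.1, 50), (100.2, 100)]
spread = asks[0][0] - best_bid  # cost of an instantaneous round-trip of one share

small_price, _ = execute_market_buy(asks, 30)   # fully absorbed at the best ask
large_price, _ = execute_market_buy(asks, 100)  # walks up to the second level
```

The small order trades at the best ask, while the larger one pays more on average because it consumes the first level and continues up the ladder.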

In a large fraction of the economics literature [2][3][4][5], liquidity providers
are described as market makers who earn their profit from the spread. The value 
of the spread is non zero because this market making strategy has costs. Three 

^1 Other determinants of liquidity discussed in the literature are the depth of the order book
and market resiliency, see [12][13].

types of cost are discussed in the literature [11]:

• (i) order processing costs (which include sheer profit for the market maker);

• (ii) adverse selection costs: liquidity takers may have superior information
on the future price of the stock, in which case the market maker loses
money;^2

• (iii) inventory risk: market makers may temporarily accumulate large long 
or short positions which are risky. If agents are risk-sensitive and have to 
limit their exposure, this adds extra-costs. 

Theoretical models that account for these costs typically introduce a rather large
number of free parameters (such as risk-aversion, fraction of informed trades,
fraction of patient/impatient traders, etc.) most of which cannot be measured
directly. In order to extract the different determinants of the spread from em- 
pirical data, some drastic assumptions must be made. For example, assuming 
the order flow to be short-range correlated, Huang and Stoll [8] find (using data
from 1992) that 90% of the spread is associated with order processing costs, and
not with adverse selection.^3 This would mean rather comfortable profits for market
making, and is a somewhat surprising conclusion since the spread on purely
electronic markets is found to be comparable to the spread in markets with specialists.
A related approach is that of [9], where the ratio of adverse selection to processing
costs was estimated to be in the range 35-50% on the NYSE in 1990 (see
also [11] for similar numbers). We will review this theoretical framework below
and detail the similarities and differences with our own analysis; one particularly
crucial difference is the assumption that the order imbalance has short-ranged
correlations^4 [9], and therefore that the market impact of a single trade is permanent,
in striking disagreement with empirical data, where the order flow is instead
found to be a long-memory process [15][16], and single trade impact transient,
but decaying very slowly [15][17]. The long-range correlation between trades, and
the corresponding temporal dependence of market impact will turn out to play 
an important role in the following discussion. 

On general grounds, both adverse selection and inventory risk imply a positive 
correlation between the spread and the volatility of the traded asset. This makes 
perfect intuitive sense, and the aim of the present paper is to clarify in detail 
the origin of this relation. Positive correlation between spread and volatility is 
indeed documented empirically (see e.g. [TSt lil fT9| [TH [20| [2H [22| [23]). but is not 

^2 This is also discussed as the free option trading problem in the literature, see e.g. [14] and
refs. therein.

^3 Adverse selection is even found to have, within this framework, a negative contribution to
the spread!

^4 Direct processing costs can be estimated to be at least ten times smaller than the spread,
in particular on electronic markets.

particularly spectacular and stands as one among other reported correlations,
e.g., with traded volume, flow of limit orders, market capitalization, etc. [11].
Here, we want to argue theoretically, and demonstrate empirically on different 
markets, that there is in fact a very strong correlation between the spread and the 
volatility per trade, rather than with the volatility per unit time. Such a strong 
relation was first noted in the case of France-Telecom [15], and independently on
the stocks of the FTSE-100 [21], but no theoretical argument was given in favor
of this relation. 

From a theoretical point of view, several statistical models of limit and mar- 
ket order flows have been analyzed to understand the distribution of the bid-ask 
spread, and relate its average value to flow and cancellation rates [Sj [251 ISSl [271 
[29l [28| [30l [31] . Some models include strategic considerations in order placement 
and look for a trade-off between the cost of delayed execution and that of imme- 
diacy, but suppose that the price dynamics is bounded in a finite interval [2S|, 
therefore neglecting the long term volatility of the price (see also [501 EI])- As 
such, these finite band models have nothing to say about the spread-volatility 
relationship. Another line of models discards all strategic components ("Zero
intelligence models") and assumes Poisson rates for limit orders, market orders
and cancellations [27][29][28].^5 One can then compute both the average bid-ask
spread and the long-term volatility as a function of these Poisson rates, and compare
these predictions with empirical data [32]. The problem with such models
is that although the order flow itself is completely random, the persistence of the 
order book leads to strong non-diffusive short term predictability of the price, 
which would be very easily picked off by high frequency automated execution 
machines. These programs search to optimize execution costs (see e.g. [TUl [55] ) 
by adequately conditioning the order flow (proportion of limit and market orders, 
timing, aggressivity) and use any short-term predictability to do so. As a result 
there are in fact very strong high frequency correlations in the order flow, coming 
from the 'hide and seek' game played by buyers and sellers within the order book 
[T^ [TB[ [Mj . A key observation is that for small tick stocks, the total available 
volume in the order book at any instant in time is in fact extremely small, on the 
order of 10~^ — 10~^ of the market capitalization, or 10^^ — 10~^ of the daily vol- 
ume (see Table 2 in Appendix 2). Clearly, the reason for such a small outstanding 
liquidity is that liquidity providers want to avoid giving a free trading option to 
informed traders. As a consequence, liquidity takers must cut their total order 
in small chunks; this creates the long term correlation in order flow [5Bj. But 
since on electronic markets sophisticated buyers and sellers can trade using at 
their best convenience either limit or market orders, the average cost of limit and 
market orders should be very similar. If - say - market orders were on average 
significantly more expensive than limit orders, more limit orders would be issued, 
thereby reducing the spread and the cost of market orders, until an equilibrium 

^5 More elaborate 'weak intelligence' models have been studied recently, see [33].

is reached.^6 That a competitive ecology between limit and market orders should
exist on order-driven markets was emphasized in [38][15][13]. However, as our
analysis reveals, this ecology turns out to be considerably more intricate than
anticipated by Handa & Schwartz [38].

In the following, we introduce the idea of infinitesimal strategies, participating
in a vanishing fraction of market or limit orders. Imposing that such strategies
lead at best to marginal profits motivates a linear relation between the instantaneous
price impact of a market order and the bid-ask spread, which we check
empirically. Interestingly, we find that the profitability of these strategies depends
in a non-trivial way on the time horizon over which they are implemented.
We show in particular that fast market making strategies can be profitable even 
though the long-term average cost of limit orders is positive, a rather paradoxical 
situation brought about by the presence of long-range correlations in the order 
flow and the temporal structure of the impact function. 

The linear relation between spread and impact in turn allows us to establish 
a proportionality relation between the spread and volatility per trade, which 
holds both across different stocks and for a given stock across time, on electronic 
markets and on the NYSE. This result shows that in a competitive electronic 
market the bid-ask spread in fact mostly comes from "adverse selection", provided
one extends this notion to account for the fact that trades can be uninformed but 
still impact the price. What is relevant here is that any unexpected component 
of the market order flow, whether it is truly informed or just random, impacts 
the price and creates a cost for limit orders, which must be compensated by the 
spread, as we now explain in detail. 

2 Limit orders vs market orders and market impact

2.1 A simple theoretical framework 

We start by reviewing the theoretical framework proposed by Madhavan, Richardson
and Roomans (MRR) in [9], which helps define various quantities and hone
in on relevant questions. We will call $v_i$ the volume of the $i$th market order, and
$\epsilon_i$ the sign of that market order ($\epsilon_i = +1$ for a buy and $\epsilon_i = -1$ for a sell). The
assumptions of the model are (i) that all trades have the same volume $v_i = v$ and
(ii) that the $\epsilon_i$'s are generated by a Markov process with correlation $\rho$, which means
that the average value of $\epsilon_i$ conditioned on the past only depends on $\epsilon_{i-1}$ and is
given by:

$$\langle \epsilon_i \rangle_{i-1} = \rho\,\epsilon_{i-1}, \qquad (1)$$

^6 Data from brokers' VWAP machines indeed show that the fraction of issued market orders
is close to 50%.

where $\langle \dots \rangle$ denotes averaging. The case $\rho = 0$ corresponds to independent trade
signs, whereas $\rho > 0$ describes positive autocorrelations of trades. Note that in
this model, correlations decay exponentially:

$$C(\ell) = \langle \epsilon_i \epsilon_{i+\ell} \rangle = \rho^{\ell}. \qquad (2)$$

The MRR model assumes that the 'true' price $p_i$ evolves both because of random
external shocks (or news) and because of trade impact. It is natural to postulate
that both external news and surprise in order flow should move the price. Since
the surprise at the $i$th trade is given by $\epsilon_i - \rho\,\epsilon_{i-1}$, MRR write the following
evolution equation for the price:

$$p_{i+1} - p_i = \xi_i + \theta\,[\epsilon_i - \rho\,\epsilon_{i-1}], \qquad (3)$$

where $\xi_i$ is the shock component, with variance $\Sigma^2$, and $\theta$ measures trade
impact, assumed to be constant (all trades are assumed to have the same volume).
Since market makers cannot guess the surprise of the next trade, they post a bid
price $b_i$ and an ask price $a_i$ given by:

$$a_i = p_i + \theta\,[1 - \rho\,\epsilon_{i-1}] + \phi; \qquad b_i = p_i + \theta\,[-1 - \rho\,\epsilon_{i-1}] - \phi, \qquad (4)$$

where $\phi$ is the extra compensation claimed by the market maker, covering processing
costs and the shock component risk. The above rule ensures no ex-post regrets
for the market maker. The spread is therefore $S = a - b = 2(\theta + \phi)$, whereas the
midpoint $m = (a + b)/2$ immediately before the $i$th trade is given by:

$$m_i = p_i - \theta\rho\,\epsilon_{i-1}. \qquad (5)$$

These equations allow one to compute several quantities that are important for the following
discussion, although they were not explicitly considered by MRR. The first one is the lagged
impact function introduced in [15][17]:

$$\mathcal{R}_\ell = \langle \epsilon_i \cdot (m_{i+\ell} - m_i) \rangle, \qquad (6)$$

which is found, within the MRR model, to increase from $\mathcal{R}_1 = \theta(1 - \rho)$ to $\mathcal{R}_\infty = \theta$
(see Appendix 1 and Fig. 1). Due to correlations between trades, the long time
impact is therefore enhanced compared to the short term impact by a factor:

$$\lambda_\infty = \frac{\mathcal{R}_\infty}{\mathcal{R}_1} = \frac{1}{1 - C_1}, \qquad (7)$$

where $C_1 = C(\ell = 1) = \rho$ in the MRR model, but the above relation is more
general (see Appendix 1).
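
As a sanity check on the lagged impact in the MRR model, the following Python sketch compares a closed form, R_lag = theta * (1 - rho**lag), with a Monte Carlo estimate. The closed form is worked out here from the price and mid-point equations, Eqs. (3) and (5), and reproduces the two limits theta * (1 - rho) and theta quoted above; it is not quoted from MRR. The function names, the seed and the sample size are illustrative assumptions, and the news shocks are set to zero since they average out of the impact.

```python
import random


def mrr_lagged_impact(theta: float, rho: float, lag: int) -> float:
    """Closed-form lagged impact of the MRR model: theta * (1 - rho**lag)."""
    return theta * (1.0 - rho ** lag)


def simulate_mrr_impact(theta: float, rho: float, lag: int,
                        n_trades: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of <eps_i * (m_{i+lag} - m_i)> with zero news shocks."""
    rng = random.Random(seed)
    eps_prev = 1          # arbitrary initial sign
    price = 0.0           # 'true' price p_i
    signs, mids = [], []
    for _ in range(n_trades):
        # Markov signs: repeat the previous sign with probability (1 + rho) / 2,
        # so that the conditional mean of eps_i is rho * eps_{i-1}.
        eps = eps_prev if rng.random() < (1.0 + rho) / 2.0 else -eps_prev
        mids.append(price - theta * rho * eps_prev)  # mid-point just before trade i
        signs.append(eps)
        price += theta * (eps - rho * eps_prev)      # price update driven by the surprise
        eps_prev = eps
    n = n_trades - lag
    return sum(signs[i] * (mids[i + lag] - mids[i]) for i in range(n)) / n


theta, rho = 1.0, 3.0 / 7.0
# Enhancement factor: long-lag impact divided by one-trade impact.
lambda_inf = mrr_lagged_impact(theta, rho, 200) / mrr_lagged_impact(theta, rho, 1)
```

With rho = 3/7 the ratio of long-lag to one-trade impact equals 1/(1 - rho) = 1.75, the value used for the MRR curve in Fig. 1.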

The second quantity is the mid-point volatility, defined as:

$$\sigma_\ell^2 = \frac{1}{\ell} \langle (m_{i+\ell} - m_i)^2 \rangle, \qquad (8)$$

which is easily computed to be:^7

$$\sigma_1^2 = \mathcal{R}_1^2 + \Sigma^2, \qquad \sigma_\infty^2 = \frac{1 + \rho}{1 - \rho}\,\mathcal{R}_1^2 + \Sigma^2 \geq \sigma_1^2. \qquad (9)$$

Within the above interpretation, the MRR model leads to the following simple
relations between spread, impact and volatility per trade:

$$S = 2\lambda_\infty \mathcal{R}_1 + 2\phi; \qquad \sigma_1^2 = \mathcal{R}_1^2 + \Sigma^2, \qquad (10)$$

relations which we generalize and test empirically in the following. From the
data presented in MRR, one observes that $\phi$ was rather large on the NYSE in
1990: $\phi / (\lambda_\infty \mathcal{R}_1) \sim 1$ to $2$.
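
The MRR relations between spread, impact and volatility per trade can be evaluated numerically as follows. This is a minimal sketch: the function name `mrr_relations` and the parameter values are hypothetical, and the long-horizon variance is derived here from the fact that the order-flow surprises in the price equation are uncorrelated, which is an assumption of the model rather than an empirical statement.

```python
def mrr_relations(theta: float, rho: float, phi: float, shock_var: float) -> dict:
    """Spread, impact and per-trade variances implied by the MRR model.

    theta: trade impact, rho: sign correlation, phi: extra compensation,
    shock_var: variance of the news shock.
    """
    r1 = theta * (1.0 - rho)            # instantaneous impact
    lambda_inf = 1.0 / (1.0 - rho)      # long-time enhancement factor
    return {
        "R1": r1,
        "lambda_inf": lambda_inf,
        "spread": 2.0 * lambda_inf * r1 + 2.0 * phi,
        "sigma1_sq": r1 ** 2 + shock_var,
        "sigma_inf_sq": (1.0 + rho) / (1.0 - rho) * r1 ** 2 + shock_var,
    }


# Illustrative parameter values only.
example = mrr_relations(theta=1.0, rho=0.5, phi=0.1, shock_var=0.25)
```

In this example the spread equals 2 * (theta + phi), as it must, and the long-horizon variance per trade exceeds the one-trade variance because the sign correlation is positive.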

Note that in the simplest case of independent trade signs ($\rho = 0$), the impact
function is time independent. In the absence of extra compensation for the market
makers, $\phi = 0$ and the above equation reduces to $\mathcal{R}_1 = S/2$. In economic terms,
this last equality has a very simple meaning: it indicates that on average, the new
mid-price after the transaction $m_{i+1} = m_i + \epsilon_i \mathcal{R}_1$ is equal to the last transaction
price $m_i + \epsilon_i S/2$, and therefore that $\mathcal{R}_1 = S/2$ is precisely the condition where
both market orders and limit orders have zero ex-post cost. This is more generally
the meaning of the MRR relation, Eq. (10): the transaction price is exactly equal
to the expected long term value of the mid-point.

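It is interesting to discuss the cost of limit orders $C_L$ slightly differently. Suppose
one wants to trade at a random instant in time. Compared to the initial
mid-point value, the average execution cost of an infinitesimal buy limit order is
given by:

$$C_L = \frac{1}{2}\left(-\frac{S}{2}\right) + \frac{1}{2}\left(\mathcal{R}_1 + C_L^{+}\right): \qquad (11)$$

with probability 1/2, the order is executed right away, $S/2$ below the mid-point;
otherwise, the mid-point moves on average by a quantity $\mathcal{R}_1$, to which must be
added the cost of a limit order conditioned to the last trade being a buy, $C_L^{+}$, for
which a similar equation can be obtained:

$$C_L^{+} = \frac{1 - \rho}{2}\left(-\frac{S}{2}\right) + \frac{1 + \rho}{2}\left(\mathcal{R}_1^{+} + C_L^{++}\right), \qquad (12)$$

with obvious notations. Since the MRR model is Markovian, one has $\mathcal{R}_1^{+} = \mathcal{R}_1$
and $C_L^{++} = C_L^{+}$, so that:

$$C_L^{+} = -\frac{S}{2} + \frac{1 + \rho}{1 - \rho}\,\mathcal{R}_1. \qquad (13)$$

Plugging this last relation in Eq. (11), we finally find:

$$C_L = -\frac{S}{2} + \frac{1}{1 - \rho}\,\mathcal{R}_1. \qquad (14)$$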



^7 There is an extra contribution to $\sigma_1^2$ coming from any high-frequency noise component that
we neglect here, coming from decimalisation, small volumes at bid/ask, etc. See [11] and
footnote 18 below.

Imposing that $C_L = 0$, one recovers the MRR relation between the spread and the
asymptotic impact (Eq. (10) with $\phi = 0$). Note, however, that the cost of a market
order compared to the initial mid-point value is $S/2$ within the MRR model - but
of course the order is still executed at the 'right' long term value of the stock.
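
The recursive argument for the cost of a buy limit order can be checked numerically. The sketch below iterates the conditional cost to its fixed point and compares the result with the closed form -S/2 + R1/(1 - rho). The function names are hypothetical, and the probabilities (1 - rho)/2 and (1 + rho)/2 for the next trade being a sell or a buy, given that the last trade was a buy, follow from the Markov assumption on trade signs.

```python
def limit_order_cost(spread: float, r1: float, rho: float, n_iter: int = 200) -> float:
    """Average cost of an infinitesimal buy limit order in the MRR setting.

    Iterates C+ = (1 - rho)/2 * (-S/2) + (1 + rho)/2 * (R1 + C+), the cost
    conditioned on the last trade being a buy, then applies the unconditional
    first step (execution with probability 1/2).
    """
    cost_after_buy = 0.0
    for _ in range(n_iter):
        cost_after_buy = (0.5 * (1.0 - rho) * (-spread / 2.0)
                          + 0.5 * (1.0 + rho) * (r1 + cost_after_buy))
    return 0.5 * (-spread / 2.0) + 0.5 * (r1 + cost_after_buy)


def limit_order_cost_closed_form(spread: float, r1: float, rho: float) -> float:
    """Closed form obtained by solving the recursion exactly."""
    return -spread / 2.0 + r1 / (1.0 - rho)


# Illustrative values: the zero-cost spread is S* = 2 * R1 / (1 - rho).
rho, r1 = 3.0 / 7.0, 0.4
zero_cost_spread = 2.0 * r1 / (1.0 - rho)
```

The iteration converges because each step shrinks the error by a factor (1 + rho)/2 < 1. At the spread `zero_cost_spread` the cost vanishes, and for narrower spreads limit orders become costly.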

2.2 Real markets are more complicated 

The above model, although suggestive and capturing the essence of the correlation 
between spread, impact and volatility, is however not fully satisfactory since it 
completely neglects the very broad distribution of traded volumes (often found to 
be log-normal, or power-law tailed) and, more importantly, the non-Markovian, 
long ranged correlation of the trade signs, which is found to decay as [15][16]:^8

$$C(\ell) = \langle \epsilon_i \epsilon_{i+\ell} \rangle \propto \ell^{-\gamma}, \qquad \gamma < 1, \qquad (15)$$

instead of the fast, exponential decay assumed in the MRR model. Because the exponent
$\gamma$ is found to be less than unity, the correlation function is not integrable,
which technically makes the series of trade signs a long-memory process. As
emphasized in [15][17], this imposes a number of non-trivial constraints on price
impact for the returns to remain uncorrelated while the order flow is strongly
auto-correlated. In particular, simple models (such as Huang and Stoll's [8][11])
where price changes include a term proportional to $\epsilon_i$ would lead to strong super-diffusion
(trends) of prices on the long run [15], in disagreement with empirical
data.

The volume-dependent lagged impact is now defined as:^9

$$\mathcal{R}_\ell(v) = \langle \epsilon_i \cdot (m_{i+\ell} - m_i) \rangle \big|_{v_i = v}. \qquad (16)$$

In the MRR model, $v$ takes a single value and $\mathcal{R}_\ell(v)$ reduces to the previously
defined quantity. The function $\mathcal{R}_\ell(v)$ was studied in detail in [15]. To a good level
of approximation, the following factorization property is found to hold: $\mathcal{R}_\ell(v) \approx
R(\ell) f(v)$, where $f(v)$ is a strongly concave function, and $R(\ell)$ an increasing
function of $\ell$ that varies by a factor of $\sim 2$ when $\ell$ increases from 1 to several
thousands (corresponding to a few days of trading).^10 The shape of $R(\ell)$, averaged
over a collection of different stocks of the PSE, is shown in Fig. 1, and compared
with the simple form assumed in the MRR model (see caption for more details).
Perhaps more importantly, the enhancement factor $\lambda_\infty$ is found empirically to be

^8 These long ranged correlations were also noted in e.g. [11][37], but the detailed shape of
the tail of $C(\ell)$ was not investigated, and its long-memory nature not discussed.

^9 In the definition of $\mathcal{R}$, care has been taken to remove any long term trend of the mid-point.
In any case, since $\langle \epsilon \rangle$ is close to zero, this trend contribution would very nearly vanish.

^10 The true asymptotic behaviour of $R(\ell)$ for longer horizons is difficult to determine empirically
due to statistical noise, and might in fact be stock dependent, see [17] for a discussion of
this point.

substantially larger than predicted by Eq. (7). For example, on the pool of 68
PSE stocks studied below, we find, averaged over all stocks, $\lambda_\infty \approx 1.75$ whereas
$1/(1 - C_1) \approx 1.32$ (see Table 2 of Appendix 2). The difference between the two
will turn out to play a crucial role in the following.
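
The difference between exponentially decaying and long-memory sign correlations can be illustrated by comparing partial sums of the two correlation functions. In the sketch below the power-law prefactor `c0 = 0.2` and the exponent `gamma = 0.5` are purely illustrative choices, not values measured in this paper; the other two numbers simply invert the quoted averages 1.75 and 1.32.

```python
def partial_sum_exponential(rho: float, n_lags: int) -> float:
    """Sum of rho**lag for lag = 1..n_lags (Markovian, MRR-like correlations)."""
    return sum(rho ** lag for lag in range(1, n_lags + 1))


def partial_sum_power_law(c0: float, gamma: float, n_lags: int) -> float:
    """Sum of c0 * lag**(-gamma) for lag = 1..n_lags (long-memory correlations)."""
    return sum(c0 * lag ** (-gamma) for lag in range(1, n_lags + 1))


# Sign correlation an MRR model would need to reproduce an enhancement factor of 1.75.
rho_implied = 1.0 - 1.0 / 1.75
# First-lag correlation corresponding to 1 / (1 - C_1) = 1.32.
c1_implied = 1.0 - 1.0 / 1.32

exp_sum = partial_sum_exponential(rho_implied, 200)      # saturates at rho / (1 - rho)
pl_sum_short = partial_sum_power_law(0.2, 0.5, 1_000)    # keeps growing with the cutoff
pl_sum_long = partial_sum_power_law(0.2, 0.5, 100_000)
```

The exponential sum saturates after a few tens of lags, whereas for an exponent below one the power-law sum keeps growing with the cutoff, which is what "not integrable" means in practice.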



Figure 1: Average over 68 PSE stocks of the impact function $R(\ell)$ as a function of
$\ell$ (plain line). The average is performed by rescaling the individual $R(\ell)$ such that
$R(\ell = 1) = 1$, and by rescaling $\ell$ by the average daily number of trades and multiplying
by 100. Dotted line: prediction of the MRR model with $\rho = 3/7$, such that $\lambda_\infty = 1.75$.
The discrepancy with empirical data shows the importance of correctly accounting for
long-range correlations in order flow.



2.3 Market order strategies 

In this section and below, we want to show how the simple relations derived in
the MRR model can be extended and tested in the general case of fluctuating
volumes and long-ranged correlation of trade signs. A first idea is to measure
empirically the average execution cost of market orders. One can define the ex-post
cost $C_M(T)$ as the difference between the transaction price at (trade-)time $i$
and the mid-point price at time $i + T$ later, with $T \gg 1$ but still much smaller
than the typical horizon of the trading strategy itself (a few days or more), in
order not to mix in the quality of the decision to trade. The above definition of
execution cost marks the trade to market after $T$ and is referred to as the realized
spread in the literature [8][11]. The volume weighted averaged cost (over
trades) of a single market order over horizon $T$ is therefore:^11

$$C_M(T) = \frac{1}{\sum_i v_i} \sum_i v_i \epsilon_i \left( m_i + \epsilon_i \frac{S_i}{2} - m_{i+T} \right) = \frac{\langle v S \rangle}{2 \langle v \rangle} - \frac{\langle v \mathcal{R}_T(v) \rangle}{\langle v \rangle}. \qquad (17)$$

The choice $T \gg 1$ allows us to use the asymptotic value of $R$, $R(\ell \gg 1) \approx
\lambda_\infty R(1)$, where we have introduced a factor $\lambda_\infty$ in conformity with the notation
of the previous section. Using the factorization property of $\mathcal{R}_\ell(v)$, we finally
obtain for the average cost of a single market order:

$$C_M(T \gg 1) = \frac{\langle v S \rangle}{2 \langle v \rangle} - \lambda_\infty \frac{\langle v \mathcal{R}_1(v) \rangle}{\langle v \rangle}, \qquad (18)$$

meaning, as is intuitively clear, that this cost is positive when spreads are large,
but may become negative if the total price impact $\lambda_\infty \mathcal{R}_1$ is large. In the plane
$x = \langle v \mathcal{R}_1(v) \rangle / \langle v \rangle$, $y = \langle v S \rangle / \langle v \rangle$ (which will repeatedly be used below to represent
empirical data) the condition $C_M(T \gg 1) = 0$ defines a straight line of slope $2\lambda_\infty$
separating an upper region where market orders are on average costly, from a
region where single market orders are favored: see Fig. 2.
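
The volume-weighted ex-post cost of market orders can be estimated directly from a trade record. The sketch below is a minimal illustration on toy data: the function name `market_order_cost` and the input layout (parallel sequences of volumes, signs, mid-points just before each trade, and spreads) are assumptions made for the example, not the data format used in this study.

```python
from typing import Sequence


def market_order_cost(volumes: Sequence[float], signs: Sequence[int],
                      mids: Sequence[float], spreads: Sequence[float],
                      horizon: int) -> float:
    """Volume-weighted cost of market orders, marked to the mid `horizon` trades later."""
    n = len(volumes) - horizon  # trades for which the later mid-point is available
    total = sum(
        volumes[i] * signs[i] * (mids[i] + signs[i] * spreads[i] / 2.0 - mids[i + horizon])
        for i in range(n)
    )
    return total / sum(volumes[:n])


# Toy record 1: the mid-point never moves, so each trade costs half the spread.
flat_cost = market_order_cost(
    volumes=[1, 2, 3, 1, 2, 3], signs=[1, -1, 1, -1, 1, -1],
    mids=[100.0] * 6, spreads=[0.2] * 6, horizon=2,
)

# Toy record 2: each trade moves the mid-point by exactly half the spread in its direction.
toy_signs = [1, 1, -1, 1, -1, -1]
toy_mids = [100.0]
for s in toy_signs[:-1]:
    toy_mids.append(toy_mids[-1] + 0.1 * s)
impact_cost = market_order_cost(
    volumes=[2.0] * 6, signs=toy_signs, mids=toy_mids, spreads=[0.2] * 6, horizon=1,
)
```

In the first record the cost is the half-spread, 0.1. In the second the impact exactly offsets the half-spread and the cost vanishes, which is the zero-cost condition discussed above.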

The above computation suggests an upper bound on the spread, which we
establish more rigorously in the next section. For larger spreads, the positive
average cost of market orders would deter their use; limit orders would then pile up
and reduce the spread. What would happen if the spread was below the red line of
slope $2\lambda_\infty$ in Fig. 2? Naively, market orders have a negative cost in that region,
and one might be able to devise profitable strategies based solely on market
orders. The idea would be to try to benefit from the impact term $\mathcal{R}_\infty$ in the
above balance equation. The growth of $\mathcal{R}_\ell$ ultimately comes from the correlation
between trades, i.e. the succession of buy (sell) trades that typically follow a
given buy (sell) market order. The simplest 'copy-cat' strategy which one can
rigorously test on empirical data is to place a market order with vanishing volume
fraction (so as not to affect the subsequent history of quotes and trades), immediately
following another market order. This strategy suffers on average from the impact
of the initial trade, used as a guide to guess the direction of the market. Therefore,
the profit $G_{CC}$ of such a copy-cat strategy, marked to market after a long time
and neglecting further unwinding costs, is reduced to:^12

$$G_{CC} = [\lambda_\infty - 1] \frac{\langle v \mathcal{R}_1(v) \rangle}{\langle v \rangle} - \frac{\langle v S \rangle}{2 \langle v \rangle}. \qquad (19)$$

^11 Note that this definition neglects the fact that one single large market order may trigger
transactions at several different prices, up the order book ladder, and pay more than the nominal
spread. Nevertheless this situation is empirically quite rare on the markets we are concerned
with, and corresponds to only a few percent of all cases [32].

^12 A more rigorous estimate of the gain of a copy-cat strategy participating in all the trades
can be obtained following the method outlined in the next section.
Imposing that this gain is non-positive, one obtains a lower line in the plane $x$, $y$,
of slope $2(\lambda_\infty - 1)$. Only below this green line can the above infinitesimal copy-cat
strategy be profitable. We therefore expect markets to operate above this line and
below the red line of slope $2\lambda_\infty$. Note however that market orders below the $2\lambda_\infty$
line are not necessarily favorable in practice, since the cost for executing a series
of market orders (which is the typical situation faced by large investors, since
the outstanding liquidity is, as noted above, always quite small) must include
the impact of past trades and this increases their average cost. Hence, the slope
of the effective zero-cost line for a series of market orders is indeed smaller than
$2\lambda_\infty$. Similarly, the long-time impact of an isolated market order, uncorrelated
with the order flow, is in fact very small [15]. These isolated market orders thus
also have a positive cost, equal to half the spread. The only way to benefit from
the average impact $\mathcal{R}_\ell$ is to free-ride on a wave of orders launched by others, as
in the above copy-cat strategy. Let us now take the complementary point of view
of limit orders and determine the region of profitable market making strategies.




Figure 2: General "phase diagram" in the plane $x = \langle v \mathcal{R}_1(v) \rangle / \langle v \rangle$, $y = \langle v S \rangle / \langle v \rangle$,
showing several regions: (i) above the red line of slope $2\lambda_\infty$, market orders are costly (on
average) and market making is profitable; (ii) below the blue line of slope $\approx 2/(1 - C_1)$,
limit orders are costly and no market-making strategy is profitable; (iii) above the black
line of slope $2\lambda_\beta$, market making on time scale $T$ (or faster) is profitable (PMM); (iv)
below the green line of slope $2(\lambda_\infty - 1)$, copy-cat strategies can be profitable (PCC).
Since neither market orders nor liquidity providing should be systematically penalized
for markets to ensure steady trading, we expect that markets should operate in the
'neutral wedge' in between the blue and the red line. Competition between liquidity
providers should push the market towards the blue line. Since copy-cat strategies
should not be profitable either, the PCC green line cannot lie above this blue line.
Note that the blue, red and black lines all coincide within the MRR model.
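
The regions of the phase diagram can be encoded as a small classifier. This is a sketch only: the function name `classify_point` and the region labels are hypothetical, the blue slope is taken as exactly 2/(1 - C_1) although the text describes it as approximate, and the code assumes the ordering green below blue below red, which holds for the average values quoted for the PSE stocks.

```python
def classify_point(x: float, y: float, lambda_inf: float, c1: float) -> str:
    """Locate a point (x, y) = (impact, spread) relative to the lines of the diagram.

    x is the volume-weighted instantaneous impact and y the volume-weighted spread.
    Assumes 2*(lambda_inf - 1) <= 2/(1 - c1) <= 2*lambda_inf.
    """
    red = 2.0 * lambda_inf * x             # zero-cost line for single market orders
    blue = 2.0 * x / (1.0 - c1)            # limit of profitability of fast market making
    green = 2.0 * (lambda_inf - 1.0) * x   # zero-gain line of the copy-cat strategy
    if y > red:
        return "market orders costly"
    if y >= blue:
        return "neutral wedge"
    if y >= green:
        return "limit orders costly"
    return "copy-cat profitable"


# Average values quoted in the text for the PSE stocks.
lambda_inf = 1.75
c1 = 1.0 - 1.0 / 1.32
labels = [classify_point(1.0, y, lambda_inf, c1) for y in (4.0, 3.0, 2.0, 1.0)]
```

For unit impact the three lines sit at spreads of 3.5 (red), 2.64 (blue) and 1.5 (green), so the four test spreads fall in four different regions.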
2.4 An infinitesimal market making strategy

Our aim is to discuss the profitability of providing liquidity to the market by formalizing
the idea of infinitesimal strategies used in the previous section. To do
so we compute the gain of a simple market making strategy which consists in
participating in a vanishing fraction of all trades through limit orders. The simplest
strategy is to consider a market maker with a certain time horizon $T$ who
provides an infinitesimal fraction $\varphi$ of the total available liquidity. As illustrated
by Eq. (17), the cost incurred by the market maker comes from market impact:
the price move between $0$ and $T$ is anti-correlated with the accumulated position.
When the crowd buys, the price goes up while the market making strategy
accumulates a short position which would be costly to buy back at time $T$, and
vice versa. More precisely, we consider a steady-state market making strategy
(which avoids explicit unwinding costs). The strategy is such that the volume offered
dynamically depends on the accumulated position, which ensures that the
inventory is always bounded. We choose the tendered fraction to be given by:
$\varphi_i = \varphi_0 (1 + \alpha V_i \epsilon)$, where $V_i$ is the (signed) position accumulated up to time $i^-$,
and $\epsilon = +1$ for orders placed at the ask and $\epsilon = -1$ for orders placed at the bid.
This mean-reverting strategy ensures that the typical position is always bounded.
One can now use this strategy for an arbitrarily long time $T$; its profit & loss is
simply given by:

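$$G_L = \sum_{i=0}^{T-1} \varphi_i \epsilon_i v_i \left( m_i + \epsilon_i \frac{S_i}{2} \right). \qquad (20)$$

For large $T$, one can replace this expression by:

$$G_L = T \left\langle \varphi_i \epsilon_i v_i \left( m_i + \epsilon_i \frac{S_i}{2} \right) \right\rangle \qquad (21)$$

with $O(T^0)$ corrections due to the residual position at $T$. Discarding the constraint
$\varphi_i > 0$ and neglecting volume-volume correlations, which are much smaller
than sign-sign correlations [15][16], we finally find: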



Ql{(3) (vS) 



1 — /? °° 
where $\beta = 1 - \alpha \varphi_0 \langle v \rangle$ fixes the typical time scale of the market making strategy.
The above expression is exact in the limit $\alpha \to 0$, and only approximate otherwise.
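
The role of the mean-reversion factor as a time scale can be seen by simulating the inventory of the market maker. The sketch below assumes the position update V_{i+1} = V_i - phi_i * eps_i * v_i (the market maker sells when the crowd buys), which is our reading of the strategy described above. The function name `simulate_inventory` and the parameter values are hypothetical, and the positivity constraint on the tendered fraction is not enforced.

```python
from typing import List, Sequence


def simulate_inventory(signs: Sequence[int], volumes: Sequence[float],
                       phi0: float, alpha: float) -> List[float]:
    """Signed position of a market maker tendering phi_i = phi0 * (1 + alpha * V_i * eps_i)."""
    position = 0.0
    path = [position]
    for eps, vol in zip(signs, volumes):
        fraction = phi0 * (1.0 + alpha * position * eps)  # offer less on the crowded side
        position -= fraction * eps * vol                  # short when the crowd buys
        path.append(position)
    return path


# Illustrative values: constant trade size and a persistent wave of buy orders.
phi0, alpha, volume = 0.01, 0.5, 100.0
beta = 1.0 - alpha * phi0 * volume   # per-trade decay factor of the inventory
path = simulate_inventory([1] * 60, [volume] * 60, phi0, alpha)
```

With constant volume the update reads V_{i+1} = beta * V_i - phi0 * v * eps_i. Under a persistent buy flow the position therefore saturates at -1/alpha instead of growing without bound, and it relaxes over roughly 1/(1 - beta) trades.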
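When $\beta \to 0$ (fast market making), Eq. (22) reduces to:

whereas $\beta \to 1$, corresponding to slow market making, yields:

$$\frac{G_L(\beta \to 1)}{T \varphi_0 \langle v \rangle} = \frac{\langle v S \rangle}{2 \langle v \rangle} - \frac{\langle v \mathcal{R}_\infty(v) \rangle}{\langle v \rangle}. \qquad (24)$$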
Setting $G_L(\beta)$ to zero leads to a linear relation between spread and impact:

$$\frac{\langle v S \rangle}{\langle v \rangle} = 2 \lambda_\beta \frac{\langle v \mathcal{R}_1(v) \rangle}{\langle v \rangle}. \qquad (25)$$

Using the empirical shape of $\mathcal{R}_\ell$ and $C(\ell)$, the slope $2\lambda_\beta$ is found to increase
from $\approx 2/(1 - C_1)$ to $2\lambda_\infty$ when $\beta$ increases. Contrary to market orders,
which benefit from the growth of the impact $\mathcal{R}_\ell$ with time, slow market making
is suboptimal. When $\beta \to 1$, $\lambda_\beta \to \lambda_\infty$ and the lower limit of profitability
of very slow market making is precisely the red line of Fig. 2 where market
orders become profitable. Faster strategies correspond to smaller values of $\lambda_\beta$,
closer to $1/(1 - C_1)$, leading to an extended region of profitability for market
making. From the assumption that the above market making strategy for any
value of $\beta$ should be at best marginally profitable (since one might find more
sophisticated strategies, which take full advantage of the correlations between
signs and volumes), we finally obtain the following bound between spread and
impact:

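$$\frac{\langle v S \rangle}{\langle v \rangle} \leq \frac{2}{1 - C_1} \frac{\langle v \mathcal{R}_1(v) \rangle}{\langle v \rangle}, \qquad (26)$$

defining the blue line of slope $2/(1 - C_1)$ in the $x$, $y$ plane of Fig. 2. Consistently
with the MRR model, when $\lambda_\infty = 1/(1 - C_1)$, the blue and red lines of Fig. 2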
exactly coincide. Using that fact that 7?."^ < 7?.^" a simple generalisation of 
the argument presented at the end of Sect. 2.1 allows one to show that the cost 
of limit orders is indeed negative above the blue line. 



2.5 Theoretical analysis: conclusions 

Eqs. (18), (19), (26) and the resulting microstructural "phase diagram" of Fig. 2
are our central results. These equations show that the cost or profitability of 
infinitesimal market and limit order strategies can be estimated from empirical 
data alone, without having to make any further assumption on the fraction of 
informed trades, the correlation between trades, etc. In order to proceed, we 
made two approximations. Firstly, we assumed that these strategies could be 
made infinitesimal, which allows us to neglect their impact on the price dynamics. 
In practice, trades occur in discrete volume, and strictly speaking the assumption 
of infinitely small volumes does not hold. However, the volume of typical trades 
is much larger than the minimum size, which suggests that this approximation 
is accurate. Secondly, we neglected all direct transaction costs, which obviously 
affect profitability. These costs are in general very small compared to the spread, 
and can therefore also reasonably be neglected. 

Our main result is that profitability, perhaps surprisingly, depends on the frequency of these strategies, a result closely related to the anomalous time dependence of the impact function. Market orders are favored at low frequencies, when impact has fully developed, whereas limit orders are favored at high frequencies, where impact is still limited and the execution probability significant.

Our analysis delineates, in the impact-spread plane, a central wedge bounded from above by a slope $2\lambda_\infty$ and from below by a slope $\approx 2/(1 - C_1)$, within which both market orders and limit orders are viable. In the upper wedge, market orders would always be costly and would be substituted by limit orders. In the lower wedge, market making strategies, even at high frequencies, would never eke out any profit. Such a market would not be sustainable in the absence of any incentive to provide liquidity. But if the spread happened to fall in this region, the enhanced flow of market orders would soon reopen the gap between bid and ask.
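As a rough illustration of the three regions just described, the sketch below locates a point of the impact-spread plane relative to the two boundary slopes. The function name, the argument names and the sample numbers (borrowed from the France Telecom values quoted in Section 3.1) are illustrative assumptions, not part of the analysis itself.

```python
def classify_regime(spread, impact, lambda_inf, c1):
    """Locate a point (x = impact, y = spread) in the phase diagram of Fig. 2.

    spread     : volume-weighted spread <vS>/<v>
    impact     : volume-weighted instantaneous impact <v R_1(v)>/<v>
    lambda_inf : asymptotic impact enhancement factor
    c1         : one-lag correlation of trade signs
    """
    if impact <= 0 or c1 >= 1:
        raise ValueError("need impact > 0 and c1 < 1")
    red_slope = 2.0 * lambda_inf    # above this slope market orders are costly
    blue_slope = 2.0 / (1.0 - c1)   # below this slope fast market making loses money
    ratio = spread / impact
    if ratio > red_slope:
        return "upper wedge"
    if ratio < blue_slope:
        return "lower wedge"
    return "central wedge"


# Hypothetical point, with lambda_inf = 1.85 and C_1 = 0.14 as quoted for France Telecom.
print(classify_regime(spread=3.0, impact=1.0, lambda_inf=1.85, c1=0.14))
```

With these inputs the blue and red slopes are about 2.33 and 3.7, so a spread-to-impact ratio of 3 falls inside the central wedge.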

Our next assumption is that simple statistical strategies must have negative or marginal profit. This is quite reasonable since high-frequency strategies carry relatively small risks. Applying this idea to market making strategies, we conclude that competition between liquidity providers will push the spreads close to the lower limit, corresponding to the blue line of slope $\approx 2/(1 - C_1)$ in Fig. 2. Now, since market taking (copy-cat) strategies should not be profitable either, the green line of slope $2(\lambda_\infty - 1)$ should necessarily lie below the blue line, leading to the following inequality on the asymptotic impact enhancement factor $\lambda_\infty$:

$$1 < \lambda_\infty < 1 + \frac{1}{1 - C_1} \qquad (27)$$

where the lower bound comes from the existence of correlation between trades (see Eq. (7)). In other words, the impact function cannot grow to more than roughly twice its initial value, otherwise statistical arbitrage would set in. Interestingly, our data is compatible with the above bound; in practice the blue and green lines turn out to be not very far from each other.
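The upper bound follows from requiring the green slope $2(\lambda_\infty - 1)$ not to exceed the blue slope $2/(1 - C_1)$. A minimal numerical check, assuming the France Telecom values quoted in Section 3.1 (the helper names are hypothetical):

```python
def lambda_inf_upper_bound(c1):
    """Upper bound 1 + 1/(1 - C_1), from 2(lambda_inf - 1) <= 2/(1 - C_1)."""
    if c1 >= 1:
        raise ValueError("c1 must be smaller than 1")
    return 1.0 + 1.0 / (1.0 - c1)


def satisfies_bound(lambda_inf, c1):
    """True if 1 < lambda_inf <= 1 + 1/(1 - C_1)."""
    return 1.0 < lambda_inf <= lambda_inf_upper_bound(c1)


print(round(lambda_inf_upper_bound(0.14), 2))  # 2.16
print(satisfies_bound(1.85, 0.14))             # True
```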

Finally, we note that market microstructure studies insist on large inventory risks being an important determinant of the bid-ask spread. However, large inventories correspond to long horizons and slow market making. Our analysis above shows that accumulating inventories on a long horizon is not only risky, but may also be extremely costly on average. When $\lambda_\infty > 1/(1 - C_1)$, market making on large horizons is significantly more costly than on short horizons, by an amount proportional to the spread itself. This is a very strong effect, which makes the existence of low-frequency market makers very unlikely. Therefore, inventory risk by itself should not be important in determining the value of the spread, at least on electronic markets.

In conclusion, we expect that electronic markets should operate in the vicinity of the blue line of Fig. 2, imposing a linear relation between spread and market impact of slope close to $2/(1 - C_1)$. This is what we test on empirical data in the following section.

3 Comparison with empirical data



3.1 Small tick electronic markets 

We first consider small tick electronic markets, such as the Paris Stock Exchange (PSE) or Index Futures. The case of large tick stocks is different since in this case the spread is (nearly) always one tick, with huge volumes at both the bid and the ask. The case of such markets will be considered below.

We studied extensively the set of the 68 most liquid stocks of the PSE during the year 2002. The summary statistics describing these stocks are given in Appendix 2. From the Trades and Quotes data, one has access to the bid-ask quotes just before each trade, from which one can obtain the sign and the volume of each trade (depending on whether the trade happened at the ask or at the bid) and the mid-point just before the trade. From this information, one computes the quantities of interest, such as the instantaneous impact function $\mathcal{R}_1$, the one-lag correlation $C_1$, the spread $S$ and $\lambda_\infty$. Note that we have removed 'block trades', which appear as transactions with volumes larger than what is available at the best price that are not followed by a change of quotes. Clearly, these block trades are outside the scope of the above arguments; in any case they typically represent a 5-10% fraction of the total number of trades and do not significantly affect the following results.
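The measurement procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the input layout (a list of `(price, bid, ask, volume)` tuples, with quotes taken just before each trade), the tie-breaking rule for the sign and the function name are all hypothetical, and $C_1$ is estimated here as the raw average $\langle \varepsilon_i \varepsilon_{i+1} \rangle$.

```python
def microstructure_stats(trades):
    """Estimate <vS>/<v>, <v R_1>/<v>, R_1 and C_1 from trades and quotes.

    trades: list of (price, bid, ask, volume); bid/ask are the quotes
    prevailing just before each trade (assumed layout).
    """
    if len(trades) < 2:
        raise ValueError("need at least two trades")
    mids = [0.5 * (bid + ask) for _, bid, ask, _ in trades]
    # Assumed sign convention: +1 if the trade price is above the mid-point
    # (trade at the ask), -1 otherwise (trade at the bid).
    signs = [1 if t[0] > m else -1 for t, m in zip(trades, mids)]
    vols = [t[3] for t in trades]
    spreads = [ask - bid for _, bid, ask, _ in trades]

    n = len(trades) - 1  # the last trade has no subsequent mid-point
    # Signed mid-point change between trade i and trade i+1.
    moves = [signs[i] * (mids[i + 1] - mids[i]) for i in range(n)]

    spread_vw = sum(v * s for v, s in zip(vols, spreads)) / sum(vols)
    impact_vw = sum(vols[i] * moves[i] for i in range(n)) / sum(vols[:n])
    r1 = sum(moves) / n
    c1 = sum(signs[i] * signs[i + 1] for i in range(n)) / n
    return spread_vw, impact_vw, r1, c1


# Tiny synthetic sample (prices in cents), for illustration only.
sample = [
    (10005, 9995, 10005, 10),
    (10010, 10000, 10010, 20),
    (10000, 10000, 10010, 10),
    (10005, 9995, 10005, 10),
]
spread_vw, impact_vw, r1, c1 = microstructure_stats(sample)
print(spread_vw, impact_vw, r1, c1)
```

A real implementation would also need the block-trade filter mentioned above, which is omitted here.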

We test the above ideas in two different ways: for a given stock across time, and across all different stocks. Since both the spread and impact vary with time, one can measure 'instantaneous' quantities by averaging for a given stock $\langle v S \rangle / \langle v \rangle$ and $\langle v \mathcal{R}_1(v) \rangle / \langle v \rangle$ over a number of successive trades. In the example of Fig. 3, each point corresponds to an average over 10000 non-overlapping trades, corresponding to 2 days of trading in the case of France Telecom in 2002. Doing so we obtain quantities that vary by a factor 5, which allows us to test the linear dependence predicted by Eqs. (19, 26). For France Telecom, we find that $\lambda_\infty$ is close to the average value 1.85 shown in Fig. 1. Therefore $2(\lambda_\infty - 1) < 2$ in this case, meaning that copy-cat market taking strategies are impossible, as expected for highly liquid stocks. We also find that $C_1 \approx 0.14$ (see Appendix 2). Our results shown in Fig. 3 are in good agreement with the above theoretical bounds, even for averages over rather short time scales. A linear fit with zero intercept gives a slope equal to 2.14, to be compared with $2/(1 - C_1) \approx 2.32$, meaning that providing liquidity is hardly rewarded at all for this very liquid, small tick stock. In fact, if the intercept of the linear fit is left free, its value (which should equal the 'processing costs' $2\phi$ in the MRR model) is found to be slightly negative.
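The two fits mentioned above (zero intercept, and free intercept) are ordinary least-squares regressions. A minimal sketch, with hypothetical helper names and plain Python lists standing in for the per-window averages:

```python
def slope_through_origin(xs, ys):
    """Least-squares slope s of the zero-intercept model y = s * x."""
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("all x values are zero")
    return sum(x * y for x, y in zip(xs, ys)) / sxx


def affine_fit(xs, ys):
    """Ordinary least squares for y = s * x + y0; returns (s, y0)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values are all identical")
    s = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return s, mean_y - s * mean_x


def blue_slope(c1):
    """Theoretical slope 2/(1 - C_1) of the blue line."""
    return 2.0 / (1.0 - c1)


# The fitted slope quoted above (2.14) lies below the blue-line slope for C_1 = 0.14.
print(2.14 < blue_slope(0.14))  # True
```

Here `xs` would hold the values of $\langle v \mathcal{R}_1 \rangle / \langle v \rangle$ and `ys` those of $\langle v S \rangle / \langle v \rangle$, one pair per window of trades; in the MRR reading, the intercept `y0` of the affine fit plays the role of $2\phi$.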

We also test Eq. (26) cross-sectionally in Fig. 4, using the above 68 different stocks of the PSE. The relative values of the spread and the average impact also vary by a factor 5 between the different stocks, which enables us to test the linear relations (19, 26). Once again we find a good agreement with the predicted bound, and the linear fit with zero intercept gives a slope of 2.86, while $\langle 2/(1 - C_1) \rangle \approx 2.64$. Hence, fast market making strategies are on average weakly profitable on the PSE. However, the intercept of a two-parameter regression is very slightly negative, showing that no order processing costs component can be detected on these fully electronic markets.

Figure 3: France Telecom in 2002. Each point corresponds to a pair ($y = \langle v S \rangle / \langle v \rangle$, $x = \langle v \mathcal{R}_1 \rangle / \langle v \rangle$), computed by averaging over 10000 non-overlapping trades ($\approx$ two trading days). Both quantities are expressed in basis points. We also show the different bounds, Eqs. (18, 19, 26), and a linear fit that gives a slope of 2.14. The correlation is $R^2 = 0.93$.

It is also interesting to analyze small tick Futures markets, for which the typical spread is ten times smaller than on stock markets. We have studied a series of small tick Index Futures in 2005 (except the MIB, for which the data is from 2004), again both as a function of time and across the 7 indexes of our set. For most contracts, the value of $C_1$ is quite large ($\langle C_1 \rangle \approx 0.42$) except for the HANGSENG where $C_1 \approx 0.035$. Results are shown in Fig. 5; the bounds are again quite well obeyed both across contracts and across time, even when the time averaging is restricted to only 1000 consecutive trades. This shows that on these highly liquid contracts, where the transaction rate is as high as a few per second, the equilibrium between spread and impact is reached very quickly.

3.2 NYSE stocks 

The case of the NYSE is quite interesting since the market is still ruled by specialists, who however compete to provide liquidity with other market participants placing limit orders. We again test Eqs. (18, 26) cross-sectionally, using the set of the 155 most actively traded stocks on the NYSE in 2005. We use the quoted bid-ask posted by the specialist. We have first determined the average impact function $\mathcal{R}_\ell$, which has a shape roughly similar to Fig. 1, although the asymptotic plateau value is slightly larger, leading to $\lambda_\infty \approx 2.1$. On the other hand, $1/(1 - C_1)$ is also slightly larger, equal to 1.39.

Figure 4: 68 stocks of the Paris Stock Exchange in 2002. Each point corresponds to a pair ($y = \langle v S \rangle / \langle v \rangle$, $x = \langle v \mathcal{R}_1 \rangle / \langle v \rangle$), computed by averaging over the year. Both quantities are expressed in basis points. We also show the different bounds, Eqs. (18, 19, 26), and a linear fit that gives a slope of 2.86, while $\langle 2/(1 - C_1) \rangle \approx 2.64$. The correlation is $R^2 = 0.90$.

Figure 5: Small tick Index Futures in 2005: CAC, DAX, FTSE, IBEX, MIB, SMI, HANGSENG. Each black square corresponds to a pair ($y = \langle v S \rangle / \langle v \rangle$, $x = \langle v \mathcal{R}_1 \rangle / \langle v \rangle$), computed by averaging over the year, while small crosses are computed by averaging over 1000 non-overlapping trades on the HANGSENG futures. Both quantities are expressed in basis points. We also show the bounds, Eqs. (26, 18), with $1/(1 - C_1) \approx 1$ (dotted blue line), corresponding to the HANGSENG, and $1/(1 - C_1) \approx 1.72$ (full blue line), corresponding to the average over all other futures.

Plotting the data in the spread-impact plane, we now find that the empirical results cluster around the upper red line limit where market orders become costly. The regression has a significantly larger slope of 3.3 and now a positive intercept $2\phi \approx 1.3$ basis points. This suggests, perhaps not surprisingly, the existence of monopoly rents on the NYSE: market makers post spreads that are systematically over-estimated compared to the situation in electronic markets, with a non-zero extrapolated spread $2\phi$ for zero market impact. This result is in agreement with the study of Harris and Hasbrouck performed in the early 90's on the NYSE [pn], which showed that limit orders were more favorable than market orders, and also with Handa and Schwartz [ID], who showed that pure limit order strategies were indeed profitable. On the other hand, the value of the regression slope on the purely electronic PSE shows that pure limit order strategies can only be marginally profitable. We have checked that using the traded spread instead of the quoted spread does not appreciably change the above conclusions.

3.3 The case of large tick electronic markets 

A priori, the string of arguments leading to Eq. (26) does not directly apply in the case where the tick size is large. In that case the spread $S$ is most of the time stuck to its minimum value, i.e. one tick, while the size of the queue $q$ at the bid and at the ask tends to be extremely large (see e.g. Appendix 2, Table 3). Because of the large value of the spread, limit orders appear to be favorable, but huge limit order volumes accumulate as liquidity providers attempt to take advantage of the spread. The size of the queue $q$ at the bid or at the ask is thus much larger than the typical value of the traded volume at each transaction $v$: $v/q \approx 0.01$ (see Table 3), to be compared with $v/q \approx 0.2$ to $0.3$ (see Appendix 2, Table 2) for smaller tick stocks. Therefore, the simple market making strategy considered above, which assumes that one can participate in a small fraction of all transactions, cannot be implemented. We thus expect that the spread on these markets will be substantially larger than predicted by the bound Eq. (26), because the competition between liquidity providers, which acts to reduce the spread, cannot fully operate. We indeed find that the ratio between $\langle v S \rangle$ and $\langle v \mathcal{R}_1 \rangle$ is large for large tick stocks. For example, in the case of Ericsson, during the period March-November 2004, for which the tick size is $\approx 50$ bp, we find $\langle v S \rangle / \langle v \mathcal{R}_1 \rangle \approx 4.5$. However, we also find on the same data that $\lambda_\infty \approx 4.5 \pm 1$, meaning that market orders are in fact not systematically unfavored in these large tick electronic markets. In fact, all data points are found to lie between our bounds, Eqs. (18, 26), but indeed significantly higher than the blue line of Fig. 2 in this case.

Footnote: The list of the 155 names is available on request.

Footnote: This is five times smaller than the average spread, leading to $\phi/\theta \approx 0.25$, much smaller than the result $\phi/\theta \approx 1$ to $2$ found within the MRR model in 1990, or a similar value reported in

Figure 6: 155 stocks of the NYSE in 2005. Each point corresponds to a pair ($y = \langle v S \rangle / \langle v \rangle$, $x = \langle v \mathcal{R}_1 \rangle / \langle v \rangle$), computed by averaging over the year. Both quantities are expressed in basis points. We also show our bounds, Eqs. (18, 19, 26). The data shows clearly that market orders are less favorable than in the electronic Paris Bourse. The regression now has a positive intercept of 1.3 bp with an $R^2 = 0.87$.

3.4 Comparison with empirical data: conclusion 

Our empirical analysis shows that on liquid markets, an approximate symmetry 
between limit and market orders indeed holds, in the sense that neither market 
orders nor limit orders are systematically unfavorable. Markets operate in the 
'neutral wedge' of Fig. 2. 

For fully electronic markets, competition for providing liquidity is efficient in keeping the spread close to its lowest value, marginally compensating impact cost. There is therefore hardly any room for market making strategies. Although the cost of isolated market orders is found to be negative, the empirically established proximity of the blue and green lines in Fig. 2 means that there is no room for simple market taking strategies either. In this discussion, time horizon and long range correlations in the order flow play an important role, overlooked in previous studies [S], [HI IH]: somewhat paradoxically, liquidity providers as a whole offer average negative costs to market orders but high frequency market making strategies still manage to get (marginally) compensated. Our analysis shows that the ecology between liquidity takers and liquidity providers turns out to be considerably more complex than anticipated by Handa & Schwartz [38]: when costs are computed on large time scales, limit orders are on average costly. This implies that a significant fraction of limit orders cannot be due to market makers, since limit orders as a whole are in arrears. The common assumption that limit orders can be attributed to liquidity providers compensated by the spread cannot be correct in electronic markets. This argument can only concern a small fraction of high-frequency market makers, whose existence is nevertheless crucial to prevent liquidity crises.

On the NYSE, spreads appear to be significantly larger: isolated market orders are now marginally costly. A linear relation between spread and impact still applies, albeit with a larger slope and a residual intercept, corresponding to market maker monopoly rents, which are absent in electronic markets.

4 Liquidity vs. volatility 

4.1 Theoretical considerations 

Consider again the MRR model discussed above, which predicts a simple relation between volatility and impact (see Appendix 1). Using the relation between spread and impact established above, this suggests a direct link between volatility per trade and spread, which we motivate and test in this section.

By definition of the volatility per trade $\sigma_1^2 = \langle (m_{i+1} - m_i)^2 \rangle$ and of the instantaneous impact $r_{1,i} = (m_{i+1} - m_i) \cdot \varepsilon_i$, one has as an identity:

$$\sigma_1^2 = \langle r_{1,i}^2 \rangle \qquad (28)$$

The instantaneous impact $r_{1,i}$ is expected to fluctuate over time for several reasons. First, the volume of the trade, the volume in the book and the spread strongly fluctuate with time. For example, on the PSE, the spread has a distribution close to an exponential, hence one has $\langle S^2 \rangle \approx 2 \langle S \rangle^2$ (see Table 2, Appendix 2). Large impact fluctuations may also arise from quote revisions due to addition or cancellation of some limit orders. Second, there might also be important news affecting the 'fundamental price' of the stock. These result in large, instantaneous jumps of the mid-point, unrelated to the trading activity itself. In order to account for both effects, we write, generalizing the above MRR relation:

$$\sigma_1^2 = a \mathcal{R}_1^2 + \Sigma^2 \qquad (29)$$

where $\mathcal{R}_1 = \langle \mathcal{R}_1(v) \rangle$ is the average impact after one trade, $a$ is a coefficient measuring the variance of impact fluctuations and $\Sigma^2$ is the news component of the volatility (see Section 2.1). A specific model for Eq. (29) was worked out in [15], and tested on France Telecom (see also [B]). Here, we establish that this relation holds quite precisely across different stocks of the PSE, with a correlation of $R^2 = 0.96$ (see Fig. 7). Perhaps surprisingly, the exogenous 'news volatility' contribution is found to be small. (The intercept of the best affine regression is even found to be slightly negative.) This could be related to the observation made in Farmer et al. [42] that for most price jumps, some limit orders are canceled too slowly and get 'grabbed' by fast market orders, which means that most of these events are already included in $\mathcal{R}_1$, in line with our general statements on the approximate symmetry between limit and market orders. In the following, we will therefore neglect $\Sigma^2$, as suggested by Fig. 7: in this sense the volatility of the stocks can be mostly attributed to market activity and trade impact. This is in agreement with the conclusions of Lyons and Evans on currency markets; see also the discussion in [15, 31].

Footnote: The distribution appears to be a power-law on other markets [33], but this is irrelevant for the following discussion.

Figure 7: Plot of $\sigma_1^2$ vs. $\mathcal{R}_1^2$, showing that the linear relation Eq. (29) holds quite precisely with $\Sigma^2 = 0$ and $a \approx 10.9$. (The intercept of the best affine regression is even found to be slightly negative.) Data here corresponds to the 68 stocks of the PSE in 2002. The correlation is very high: $R^2 = 0.96$.



Our final assumption is that of universality, i.e. when the tick size is small enough and the typical number of shares traded is large enough, all stocks within the same market should behave identically up to a rescaling of the average spread and the average volume. In particular we assume that the statistics of (i) the volume of market orders, (ii) the spread $S$ and (iii) the impact $\mathcal{R}$, and the correlations between these quantities, are independent of the stock when these quantities are normalized by their average value. This universality implies that:

$$\langle v S \rangle = b \langle v \rangle \langle S \rangle, \qquad (30)$$

where $b$ is stock independent. Similarly,

$$\langle v \mathcal{R}_1(v) \rangle = b' \langle v \rangle \mathcal{R}_1, \qquad (31)$$

where $b'$ is also stock independent. Note that this assumption is consistent with the empirical observation of [U], where the impact function $\mathcal{R}_1(v)$ for different US stocks can indeed be rescaled onto a unique Master curve by a proper scaling of both the $x$ and $y$ axes. We test Eqs. (30, 31) in Fig. 8 in the case of the Paris Stock Exchange, from which we extract $b \approx 1.02$ and $b' \approx 1.80$. Interestingly, we find that the volume and the spread are nearly uncorrelated ($b \approx 1$), whereas the volume traded and the impact are correlated ($b' > 1$), as expected.

Footnote: One could argue that our results simply show that the news volatility $\Sigma$ itself is proportional to $\mathcal{R}_1$ and thus to the spread $S$. However, there is no reason why this should a priori be the case. For example, a model where jumps of typical amplitude $J$ have a small probability per trade $p$ leads to $\Sigma = \sqrt{p} J$, whereas the cost of such jumps, contributing to $S$, is $pJ \ll \Sigma$.
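The constants $b$ and $b'$ are ratios of the form $\langle v X \rangle / (\langle v \rangle \langle X \rangle)$, which equal 1 when $v$ and $X$ are uncorrelated and exceed 1 when they are positively correlated. A minimal sketch of such an estimator, with a hypothetical function name and toy inputs:

```python
def correlation_ratio(volumes, values):
    """Return <v X> / (<v> <X>) for paired samples of volume v and quantity X."""
    n = len(volumes)
    if n == 0 or n != len(values):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_v = sum(volumes) / n
    mean_x = sum(values) / n
    mean_vx = sum(v * x for v, x in zip(volumes, values)) / n
    return mean_vx / (mean_v * mean_x)


# Toy data: a constant 'spread' is uncorrelated with volume, so the ratio is 1;
# an 'impact' growing with volume gives a ratio above 1.
print(correlation_ratio([1, 2, 3], [2, 2, 2]))
print(correlation_ratio([1, 2, 3], [1, 2, 3]))
```

Applied per stock with $X = S$ or $X$ equal to the instantaneous impact, this would give one estimate of $b$ or $b'$ for each stock, to be compared across the sample.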
Figure 8: Plot of $\langle v X \rangle / \langle v \rangle$ vs. $\langle X \rangle$, where $X$ is either the spread $S$ or the instantaneous impact $\mathcal{R}_1(v)$ (multiplied by a factor 5 for clarity). The quality of the linear regression tests our universality assumption, which is excellent for $S$ ($R^2 = 0.98$) and satisfactory for $\mathcal{R}_1$ ($R^2 = 0.9$). The values $b \approx 1.02$ and $b' \approx 1.80$ are given by the slopes of these regressions. Data here corresponds to the 68 stocks of the PSE in 2002.

Therefore, using Eq. (26) as an equality (as suggested by the empirical results of Section 3), and Eqs. (29, 30, 31), we obtain the main result of this section:

$$\langle S \rangle = c \sigma_1, \qquad (32)$$

where $c$ is a stock independent numerical constant, which can be expressed using the constants introduced above as $c = 2 \lambda b' / (\sqrt{a}\, b)$. This very simple relation between volatility per trade and average spread was noted in [15] [Elj], and we present further data in the next section to support this conjecture. Therefore, the constraints that (i) optimized high frequency execution strategies impose that the price is diffusive (see [15] [IHDj]) and (ii) the cost of limit and market orders are nearly equal [Eqs. (18, 26)], lead to a simple relation between liquidity and volatility. As an important remark, note that the above relation is not expected to hold for the volatility per unit time $\sigma$, since it involves an extra stock-dependent and time-dependent quantity, namely the trading frequency $\nu$, through:

$$\sigma = \sigma_1 \sqrt{\nu}. \qquad (33)$$

We will discuss this issue further in Section 5.

Footnote: The universality of the shape of the order book was indeed checked to hold rather well in

4.2 Comparison with empirical data

Using the same data sets as in Sections 3.1 and 3.2, we now test empirically the predicted linear relation between spread and volatility per trade, Eq. (32). The average spread $\langle S \rangle$ is defined as the average distance between bid and ask immediately before each trade (and not as the average over all posted quotes). The volatility per trade is defined as the root mean square of the trade by trade return. Our results for the Paris Stock Exchange are shown in Figs. 9 and 10. We see that Eq. (32) describes the data very well, with $R^2$ values over 0.9. Interestingly, using the results obtained above across the PSE stocks, we have $a \approx 10.9$, $b \approx 1.02$, $b' \approx 1.80$, $\lambda \approx 1.43$, leading to $c \approx 1.53$, in close correspondence with the direct regression result $c \approx 1.58$. Similar results are obtained for Index futures (Figs. 11-a & b) or for the NYSE (Fig. 12), with values of $c$ which are all very similar, $c \approx 1.2$ to $1.6$. We have also checked that there is an average intra-day pattern which is followed in close correspondence both by $\langle S \rangle$ and $\sigma_1$: spreads are larger at the opening of the market and decline throughout the day. Note that the trading frequency $\nu$ increases as time elapses, which, using Eq. (33), explains the familiar U-shaped pattern of the volatility per unit time.
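The value of $c$ quoted above can be checked directly from $c = 2 \lambda b' / (\sqrt{a}\, b)$. The sketch below uses the PSE values given in the text ($a \approx 10.9$, $b \approx 1.02$, $\lambda \approx 1.43$) together with $b' \approx 1.80$ from Fig. 8; the function names are illustrative only.

```python
import math


def spread_volatility_constant(lam, a, b, b_prime):
    """c = 2 * lambda * b' / (sqrt(a) * b), such that <S> = c * sigma_1."""
    return 2.0 * lam * b_prime / (math.sqrt(a) * b)


def volatility_per_unit_time(sigma_1, nu):
    """sigma = sigma_1 * sqrt(nu), with nu the trading frequency."""
    return sigma_1 * math.sqrt(nu)


c = spread_volatility_constant(lam=1.43, a=10.9, b=1.02, b_prime=1.80)
print(round(c, 2))  # 1.53
```

The second helper simply restates the conversion between volatility per trade and volatility per unit time; at fixed $\sigma_1$, the volatility per unit time grows as the square root of the trading frequency.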

5 Discussion and conclusion 

The main theoretical result of this paper is the possibility to express the cost of market orders and the profit of infinitesimal market-making/taking strategies in terms of directly observable quantities, namely the spread and the lag-dependent impact function. Imposing that any market taking or liquidity providing strategy is at best marginally profitable allows one to define viable regions of the microstructural "phase-diagram" (Fig. 2) where electronic markets should operate, and suggests a linear relation between spread and instantaneous impact. This relation is in good agreement with empirical data on small tick contracts, with a slope compatible with marginal profitability of both fast market making and copy-cat market taking strategies. Somewhat paradoxically, we find that liquidity providers as a whole offer average negative costs to market orders although high frequency market making strategies still manage to get (marginally) compensated. Our analysis allows us to compare in an objective way the spreads in different markets and suggests that spreads are distinctly larger on the NYSE. Note that our analysis does not require any model specific assumptions such as the nature of order flow correlations or the fraction of informed trades. In fact our results hold even if trades were all uninformed but still mechanically impact the price.

Footnote: Since prices are very close to random walks, defining the volatility from returns defined on a longer time scale gives very similar results. On our set of PSE stocks, we find that $\sigma_{128} / \sqrt{128} \approx 0.84 \sigma_1$, indicating a small anti-correlation of returns ($\approx 15\%$) on short time scales.

Figure 9: Test of Eq. (32) for France Telecom in 2002. Each point corresponds to a pair ($\langle S \rangle$, $\sigma_1$), computed by averaging over 10000 non-overlapping trades ($\approx$ two trading days). Both quantities are expressed in basis points. From a linear fit, we find $c \approx 1.69$ with $R^2 = 0.90$.

Figure 11: Test of Eq. (32) for the HANGSENG futures contract (triangles), and across small tick Index Futures in 2005: CAC, DAX, FTSE, IBEX, MIB, SMI, HANGSENG (squares). Each point corresponds to a pair ($\langle S \rangle$, $\sigma_1$), computed by averaging either over 1000 non-overlapping trades (triangles) or over the whole year (squares). From a linear fit, we find $c \approx 1.53$ for the HANGSENG across time and $c \approx 1.17$ across Index Futures.

Figure 12: Test of Eq. (32) for stocks from the NYSE in 2005. Each point corresponds to a pair ($\langle S \rangle$, $\sigma_1$), computed by averaging over the entire year. Both quantities are expressed in basis points. From a linear fit, we find $c \approx 1.32$, with $R^2 = 0.91$.

Making reasonable further assumptions, we have then shown that spread $S$ and volatility per trade $\sigma_1$ are also proportional, a result that we confirm empirically with correlations above 0.9. This very simple relation means that most of the volatility comes from trading alone, and suggests that the bid-ask spread is dominated by adverse selection, provided one considers the volatility per trade as a measure of the amount of 'information' included in prices at each transaction. There are indeed two complementary economic interpretations of the relation $\sigma_1 \sim S$ in small tick markets:

• (i) since the typical available liquidity in the order book is quite small, market orders tend to grab a significant fraction of the volume at the best price; furthermore, the size of the 'gap' above the ask or below the bid is observed to be on the same order of magnitude as the bid-ask spread itself, which therefore sets a natural scale for price variations. Hence both the impact and the volatility per trade are expected to be of the order of $S$, as observed;

• (ii) the relation can also be read backward as $S \sim \sigma_1$: when the volatility per trade is large, the risk of placing limit orders is large and therefore the spread widens until limit orders become favorable.

Therefore, there is a clear two-way feedback that imposes the relation $\sigma_1 \sim S$, valid on average; any significant deviation tends to be corrected by the resulting relative flow of limit and market orders. Our result therefore appears as a fundamental property of the organization of markets, which should be satisfied within any theoretical description of the micro-structure. Zero intelligence models [32], or bounded-range models [26l [30l El], fail to predict any universal relation between $S$ and $\sigma_1$.
Our relation involves the volatility per trade whereas most of the econometric work has instead focused on the volatility per unit time $\sigma$. The relation between the two involves the trading frequency $\nu$, which is itself both time- and stock-dependent. As a function of time, we find, in agreement with [38], that volatility per trade and trading frequency are positively correlated; the volatility $\sigma = \sigma_1 \sqrt{\nu}$ therefore increases because both $\sigma_1$ and $\nu$ increase. Across stocks, on the other hand, the volatility per unit time exhibits only weak systematic variations with capitalization $C$: $\sigma \sim C^{-\chi}$ with $\chi \approx 0$, whereas the trading frequency increases with capitalization as $\nu \sim C^{\zeta}$. For stocks belonging to the FTSE-100, Zumbach finds $\zeta \approx 0.44$ [21], while for US stocks the scaling for $\nu$ is less clear [19]. Interestingly, our result then leads to a relation between average spread and capitalization of the form $S \sim C^{-\chi - \zeta/2} \sim C^{-0.22}$, in good agreement with Zumbach's data [21], with the impact data of Lillo et al. [45] and with our own data on the PSE.
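The exponent of the spread-capitalization relation follows from combining $S \sim \sigma_1$ with $\sigma_1 = \sigma / \sqrt{\nu}$. A one-line check is sketched below; the symbol name `chi` for the (nearly vanishing) volatility exponent and the function name are assumptions made for illustration.

```python
def spread_exponent(chi, zeta):
    """Exponent e in S ~ C**e, given sigma ~ C**(-chi), nu ~ C**zeta
    and S ~ sigma_1 = sigma / sqrt(nu)."""
    return -chi - zeta / 2.0


print(round(spread_exponent(chi=0.0, zeta=0.44), 2))  # -0.22
```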

The fundamental question at this stage is to know what fixes the volatility $\sigma$ and the trading frequency $\nu$. Clearly, the trading frequency has to do with the available liquidity and the way large volumes have to be cut in small pieces. But is the volatility per unit time the primary object, driven by a fundamental process such as the arrival of news, to which the volatility per trade and therefore the spread is slaved? Or is the market micro-structure and trading activity imposing, in a bottom-up way, the value of the volatility? Understanding these coupled dynamical problems appears to be a major challenge for the theory of financial markets, and an unavoidable step to understand the interrelation between order flow and price changes, and liquidity and market efficiency [HI [43], [20l [211 EZl flSl [El [50].

We want to warmly thank S. Bogner, J. D. Farmer, Th. Foucault and G. Zumbach for important and useful discussions. We also thank the referees for very constructive remarks, which helped improve the manuscript.

Footnote: The long-memory property of $\sigma$ is argued in [47] to be related to long range correlation in the trading frequency rather than in the volatility per trade, but see [23].

Appendix 1: Impact and volatility in the MRR model

From the basic equation determining the dynamics of the mid-point,

$$m_{i+1} - m_i = p_{i+1} - p_i - \theta \rho (\varepsilon_i - \varepsilon_{i-1}) = \xi_i + \theta (1 - \rho) \varepsilon_i, \qquad (34)$$

one gets:

$$m_{i+\ell} - m_i = \sum_{j=i}^{i+\ell-1} \xi_j + \theta (1 - \rho) \sum_{j=i}^{i+\ell-1} \varepsilon_j. \qquad (35)$$

Therefore, taking into account the correlation between the $\varepsilon$s, and the assumption that external shocks are uncorrelated with the order flow, the impact function is:

$$\mathcal{R}_\ell = \langle \varepsilon_i (m_{i+\ell} - m_i) \rangle = \theta (1 - \rho) \sum_{\ell'=0}^{\ell-1} \rho^{\ell'} = \theta (1 - \rho^\ell). \qquad (36)$$

Note that in this model, the 'bare' impact function $G_0(\ell)$ defined in [15, 17] through:

$$m_i = \sum_{j=-\infty}^{i-1} \xi_j + \sum_{j=-\infty}^{i-1} G_0(i - j - 1) \varepsilon_j, \qquad (37)$$

is here found to be constant, equal to $G_0(\ell) = \theta (1 - \rho)$. Finally, one finds:

$$\sigma_1^2 = \langle (m_{i+1} - m_i)^2 \rangle = \Sigma^2 + \theta^2 (1 - \rho)^2 \qquad (38)$$

and 

,3,) 

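More generally, assuming that only the sign surprise matters, one can write, for 
arbitrary correlations between signs: 

$$ m_{i+\ell} - m_i = \sum_{j=i}^{i+\ell-1} \xi_j + \theta \sum_{j=i}^{i+\ell-1} \left( \varepsilon_j - E_j[\varepsilon_{j+1}] \right), \qquad (40) $$

where the last term, $E_j[\varepsilon_{j+1}]$, is the conditional expectation of the next sign. The impact 
function now generalizes to: 

$$ \mathcal{R}_\ell = \theta\,[1 - C(\ell)], \qquad (41) $$

and therefore $\lambda_\infty = 1/(1 - C_1)$. 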
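The step from (40) to (41) is a telescoping sum, which can be sketched as follows. The notation $E_j[\varepsilon_{j+1}]$ for the expectation of the next sign conditional on the signs up to trade $j$ is an assumption of this sketch, as is the stationarity of the sign correlation $C(\ell) = \langle \varepsilon_i \varepsilon_{i+\ell} \rangle$ with $C(0) = 1$. The shocks $\xi_j$ drop out because they are uncorrelated with the order flow, and for $j \geq i$ the sign $\varepsilon_i$ is already known at trade $j$, so that $\langle \varepsilon_i E_j[\varepsilon_{j+1}] \rangle = \langle \varepsilon_i \varepsilon_{j+1} \rangle$.

```latex
\begin{align*}
\mathcal{R}_\ell
  &= \langle \varepsilon_i \,(m_{i+\ell} - m_i) \rangle
   = \theta \sum_{j=i}^{i+\ell-1}
     \Big[ \langle \varepsilon_i \varepsilon_j \rangle
         - \langle \varepsilon_i \, E_j[\varepsilon_{j+1}] \rangle \Big] \\
  &= \theta \sum_{j=i}^{i+\ell-1} \big[ C(j-i) - C(j+1-i) \big]
   = \theta \, [\, C(0) - C(\ell) \,]
   = \theta \, [\, 1 - C(\ell) \,].
\end{align*}
```

With $C(\ell) = \rho^\ell$ this reduces to the MRR result (36). If $\lambda_\infty$ is read as the ratio $\mathcal{R}_\infty / \mathcal{R}_1$ (an interpretation, since its definition lies outside this appendix) and $C(\ell) \to 0$ at large lags, then $\mathcal{R}_\infty = \theta$ and $\mathcal{R}_1 = \theta(1 - C_1)$, which gives $\lambda_\infty = 1/(1 - C_1)$.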

Appendix 2: Summary statistics 
Code  Name                            Code  Name
ACA   Credit Agricole                 IFG   Infogrames Entertainment
AC    Accor                           LG    Lafarge
AF    Air France-KLM                  LI    Klepierre
AGF   Assurances Generales de France  LY    Suez
AI    Air Liquide                     MC    LVMH
ALS   Alstom RGPT                     ML    Michelin
ALT   Altran                          MMB   Lagardere
AVE   Aventis                         MMT   M6-Metropole Television
BB    Societe BIC                     NAD   Wanadoo
BN    Groupe Danone                   NK    Imerys
CAP   Cap Gemini                      OGE   Orange
CA    Carrefour                       OR    L'Oreal
CDI   Christian Dior                  PEC   Pechiney
CGE   Alcatel                         PP    PPR
CK    Casino Guichard ...             PUB   Publicis Groupe
CL    Credit Lyonnais                 RF    Eurazeo
CNP   CNP Assurances                  RHA   Rhodia
CO    Casino Guichard                 RI    Pernod-Ricard
CS    AXA                             RNO   Renault
CU    Club Mediterranee               RXL   Rexel
CY    Castorama Dubois                SAG   Safran
DEC   JC Decaux                       SAN   Sanofi-Aventis
DG    Vinci                           SAX   Atos Origin
DSY   Dassault Systemes               SCO   SCOR
EF    Essilor International           SC    Simco
EN    Bouygues                        SU    Schneider Electric
FP    Total                           SW    Sodexho Alliance
FR    Valeo                           TEC   Technip
FTE   France Telecom                  TFI   Television Francaise 1
GFC   Gecina                          TMM   Thomson
GLE   Societe Generale                UG    Peugeot
GL    Galeries Lafayette              UL    Unibail
HAV   Havas                           VIE   Veolia Environnement
HO    Thales                          ZC    Zodiac



Table 1: Codes and names of the PSE stocks analyzed in Table 2. 
Code  Turnover  ⟨q_t⟩    ⟨v⟩  # trades     σ_1     ⟨S⟩     σ_S    R_1    C_1    ...    Tick
...   35  9.9  ...  5.62  ...  82.7  ...  46.0  ...  9.27  3.04  ...  7.07
GL         0.9     43   13.6        17   24.34   41.34     ...    ...   0.29   1.38    7.24
...        8.1     57   14.6       139   21.15   32.44   26.53   7.41   0.18   1.91   17.15
HO        11.3     61   19.7       143     ...   15.02   15.86   3.65   0.29   1.94    2.85
IFG        3.2     24    4.9       163   29.05   40.35   32.95   8.51   0.15   2.31   21.79
LG        38.2    193   36.6       ...    7.82   13.31    9.99   2.90   0.24   1.67    7.33
LI         0.2     50   15.8         3     ...   29.90   23.18   4.69   0.20   0.70     ...
LY        58.4    111   28.8       507    8.40   12.84   11.94   3.11   0.20   1.97    4.31
MC        52.9    143   34.7       381    8.01   12.41   10.75   3.01   0.26   2.20    4.18
ML        13.2     71   21.2       156   10.23   14.98   15.36   3.35   0.30   1.93    2.71
MMB       13.3     76   18.9       177   10.67   15.96   15.84   3.60   0.25   1.94    3.50
MMT        0.7     33   10.1        17   33.82   49.77   48.42   9.71   0.32   1.66    3.57
NAD        4.6     47    6.0       188   14.39   31.27   20.91   4.11   0.17   1.89   20.36
NK         1.1     43   13.4        21   25.95   42.89   39.34   7.69   0.29   1.40    8.07
OGE       36.4    182   21.2       429   11.40   21.53   13.35   3.57   0.16   1.82   16.03
OR        68.0    211   41.9       406    7.30   12.45    9.45   2.76   0.25   1.39    6.59
PEC        9.5    110   31.9        75   15.06   23.71   24.89   5.07   0.24   2.41    5.45
PP        36.8    154   31.5       292   10.08   15.44   13.25   3.39   0.25   2.23    7.23
PUB       11.9     78   27.3       109   15.08   21.19   22.92   4.97   0.23   2.46    3.85
RF         0.1     25    7.0         3   22.27   37.26   32.69   7.07   0.23   0.72    8.25
RHA        2.1     32    9.7        55   21.45   33.99   32.97   6.71   0.22   1.87   10.83
RI        12.6    138   39.1        80   10.49   16.82   16.01   3.64   0.22   1.49    6.42
RNO       35.8    158   34.4       260    8.01   12.80   11.13   2.71   0.25   2.27    4.38
RXL        0.7     34   13.0        14   31.30   51.91   50.49   9.21   0.26   1.50    6.77
SAG        1.5     36   10.3        35   24.59   43.15   42.89   7.29   0.22   1.73    7.48
SAN       91.2    301   56.1       117    7.76   12.18    8.60   3.01   0.25   1.19    7.92
SAX        6.0     57   20.4        73   23.28   33.48   33.79   7.45   0.27   2.58    5.23
SCO        2.0     25    9.3        55   35.67   38.88   40.00   8.37   0.23   2.16    7.40
SC         0.5     55   13.9        10   12.24   21.08   19.01   3.58   0.22   1.28    6.10
SU        26.2    129   33.0       198    9.52   14.60   13.43   3.30   0.26   2.15    5.63
SW         ...    ...    ...       ...     ...     ...     ...    ...    ...    ...     ...
TEC        9.4    123   31.6        74   16.27   24.27   25.77   5.06   0.24   2.18    7.31
TFI       17.4     63   21.0       207   11.91   15.87   16.19   4.09   0.25   2.07    3.71
TMM       28.0     78   20.8       338   10.08   16.08   15.59   3.18   0.19   2.34    4.29
UG        33.2    141   36.2       229    7.95   11.91   10.80   2.86   0.26   2.11    4.43
UL         3.0     64   22.9        33   14.61   24.45   23.47   4.87   0.27   1.41    8.07
VIE       19.8     77   24.8       199   11.52   16.60   17.81   3.75   0.23   2.23    3.57
ZC         0.9     33    8.7        26   21.95   11.99   12.10   7.28   0.21   1.61    1.21



Table 2: Pool of the 68 stocks of the PSE studied in this paper, with their summary 
statistics for 2002. The daily turnover is in million Euros, $\langle q_t \rangle$ is the average amount in 
book (bid+ask) in thousand Euros, $\langle v \rangle$ is the average size of market orders (in thousand 
Euros). The total number of trades (in thousands) corresponds to the whole year 2002. 
The volatility per trade $\sigma_1$, the average spread $\langle S \rangle$, the spread standard deviation $\sigma_S$, 
the average response $\mathcal{R}_1$ and the average tick size are all in basis points. Note that 
$\sigma_S \sim \langle S \rangle$, characteristic of an exponential distribution of the spread. Note also that 
the volume available at the best prices is ~ 10"'^ of the daily turnover. 
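The two remarks in the caption can be checked directly on the table. The sketch below is illustrative only: it copies six rows of Table 2 (OR, RNO, UG, TFI, VIE, MMT), computes the ratio of the spread standard deviation to the mean spread and the ratio of the volume at the best prices to the daily turnover, and compares the first ratio with a sampled exponential distribution, for which the standard deviation equals the mean. The helper names `spread_ratio`, `book_fraction` and `exponential_ratio` are hypothetical.

```python
import numpy as np

# (code, daily turnover [million EUR], <q_t> [thousand EUR], <S> [bp], sigma_S [bp])
# Values copied from Table 2 for a few stocks; the selection is arbitrary.
ROWS = [
    ("OR", 68.0, 211, 12.45, 9.45),
    ("RNO", 35.8, 158, 12.80, 11.13),
    ("UG", 33.2, 141, 11.91, 10.80),
    ("TFI", 17.4, 63, 15.87, 16.19),
    ("VIE", 19.8, 77, 16.60, 17.81),
    ("MMT", 0.7, 33, 49.77, 48.42),
]


def spread_ratio(mean_spread, spread_std):
    """Ratio sigma_S / <S>; equal to 1 for an exponential distribution."""
    return spread_std / mean_spread


def book_fraction(book_keur, turnover_meur):
    """Volume at the best prices as a fraction of the daily turnover."""
    return (book_keur * 1e3) / (turnover_meur * 1e6)


def exponential_ratio(mean_spread, n_samples=100_000, seed=0):
    """Std/mean ratio of spreads sampled from an exponential distribution."""
    rng = np.random.default_rng(seed)
    sample = rng.exponential(scale=mean_spread, size=n_samples)
    return float(sample.std() / sample.mean())


if __name__ == "__main__":
    for code, turnover, book, s_mean, s_std in ROWS:
        print(code, round(spread_ratio(s_mean, s_std), 2),
              f"{book_fraction(book, turnover):.1e}")
    print("exponential benchmark:", round(exponential_ratio(12.45), 2))
```

For these six rows the ratio of $\sigma_S$ to $\langle S \rangle$ lies between about 0.76 and 1.07, and the volume at the best prices ranges from a few thousandths of the daily turnover for the most liquid names to a few percent for a thinly traded stock such as MMT.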
Code  Turnover  ⟨q_t⟩  ⟨v⟩  # trades  σ_1   ⟨S⟩  σ_S   R_1  Tick
LMEB       262     21  199       211  8.1  16.7  1.1  11.5  16.6



Table 3: Summary statistics for Ericsson in the period March 2004-November 2004. 
Units are the same as in Table 2, except $\langle q_t \rangle$, which is now in million Euros. Note that 
$\langle v \rangle / \langle q_t \rangle \approx 10^{-2}$. 
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